On evaluating brain tissue classifiers without a ground truth.
Abstract
In this paper, we present a set of techniques for the evaluation of brain tissue classifiers on a large data set of MR images of the head. Due to the difficulty of establishing a gold standard for this type of data, we focus our attention on methods that do not require a ground truth, but instead rely on a common agreement principle. Three different techniques are presented: the Williams' index, a measure of common agreement; STAPLE, an Expectation-Maximization algorithm which simultaneously estimates performance parameters and constructs an estimated reference standard; and Multidimensional Scaling, a visualization technique for exploring similarity data. We apply these different evaluation methodologies to a set of eleven different segmentation algorithms on forty MR images. We then validate our evaluation pipeline by building a ground truth based on human expert tracings. The evaluations with and without a ground truth are compared. Our findings show that comparing classifiers without a gold standard can provide a substantial amount of useful information. In particular, outliers can be easily detected, strongly consistent or highly variable techniques can be readily discriminated, and the overall similarity between different techniques can be assessed. On the other hand, we also find that some information present in the expert segmentations is not captured by the automatic classifiers, suggesting that common agreement alone may not be sufficient for a precise performance evaluation of brain tissue classifiers.
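To make the common-agreement principle concrete, here is a minimal sketch of the Williams' index for binary segmentations, assuming Dice overlap as the pairwise agreement measure (the paper considers other agreement measures as well). The index compares how well one rater agrees with the rest of the group against how well the group agrees with itself.

```python
import numpy as np

def dice(a, b):
    """Dice overlap between two binary masks (agreement in [0, 1])."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

def williams_index(masks, j):
    """Williams' index of rater j: mean agreement of rater j with the
    other raters, divided by the mean pairwise agreement among the
    others. Values above 1 suggest rater j agrees with the group at
    least as well as the group agrees with itself. Needs >= 3 raters."""
    others = [k for k in range(len(masks)) if k != j]
    num = np.mean([dice(masks[j], masks[k]) for k in others])
    den = np.mean([dice(masks[k], masks[m])
                   for i, k in enumerate(others) for m in others[i + 1:]])
    return num / den
```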
Similar papers
Evaluating Automatic Brain Tissue Classifiers
We present a quantitative evaluation of MR brain image segmentation. Five classifiers were tested. The task was to classify an MR image into four different classes: background, cerebrospinal fluid, gray matter, and white matter. The performance was rated by first estimating a ground truth (EGT) using STAPLE and then analyzing the volume differences as well as the Dice similarity measure betwe...
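As an illustration of this evaluation protocol, the sketch below computes a per-class Dice coefficient and a relative volume difference between a classifier's label map and an estimated ground truth (EGT). The label encoding and the exact volume-difference definition are assumptions for the example, not taken from the paper.

```python
import numpy as np

# Hypothetical label encoding for a four-class head MR segmentation.
LABELS = {0: "background", 1: "cerebrospinal fluid",
          2: "gray matter", 3: "white matter"}

def per_class_metrics(seg, egt):
    """Per-class Dice and relative volume difference between a
    classifier's label map `seg` and an estimated ground truth `egt`
    of the same shape."""
    seg, egt = np.asarray(seg), np.asarray(egt)
    metrics = {}
    for label, name in LABELS.items():
        s, g = seg == label, egt == label
        dsc = 2.0 * np.logical_and(s, g).sum() / max(int(s.sum() + g.sum()), 1)
        vol_diff = (int(s.sum()) - int(g.sum())) / max(int(g.sum()), 1)
        metrics[name] = {"dice": dsc, "volume_difference": vol_diff}
    return metrics
```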
Effect of Errors in Ground Truth on Classification Accuracy
The effect of errors in ground truth on the estimated thematic accuracy of a classifier is considered. A relationship is derived between the true accuracy of a classifier relative to ground truth without errors, the actual accuracy of the ground truth used, and the measured accuracy of the classifier as a function of the number of classes. We show that if the accuracy of the ground truth is kno...
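The paper's exact relationship is truncated here, but under the common simplifying assumptions of independent errors distributed uniformly over the remaining classes, the measured agreement between a classifier and a noisy ground truth takes the form sketched below. This is a sketch of that assumed model, not necessarily the paper's derivation.

```python
def measured_accuracy(a_true, a_gt, n_classes):
    """Agreement between a classifier and an imperfect ground truth,
    assuming independent errors spread uniformly over the other classes."""
    chance = (1.0 - a_true) * (1.0 - a_gt) / (n_classes - 1)
    return a_true * a_gt + chance

def corrected_accuracy(a_measured, a_gt, n_classes):
    """Invert the relation above to recover an estimate of the
    classifier's true accuracy from the measured one."""
    u = (1.0 - a_gt) / (n_classes - 1)
    return (a_measured - u) / (a_gt - u)
```

For instance, with four classes, a classifier whose true accuracy is 0.90 scored against 95%-accurate ground truth would measure about 0.857 rather than 0.90 under this model.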
Evaluating Classifiers Without Expert Labels
This paper considers the challenge of evaluating a set of classifiers, as done in shared task evaluations like the KDD Cup or NIST TREC, without expert labels. While expert labels provide the traditional cornerstone for evaluating statistical learners, limited or expensive access to experts represents a practical bottleneck. Instead, we seek methodology for estimating performance of the classif...
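One simple instance of such expert-free evaluation, offered here purely as an illustration of the idea rather than the paper's actual methodology, scores each classifier against a majority-vote pseudo-reference built from the whole pool:

```python
import numpy as np

def pseudo_label_accuracy(predictions):
    """Score each classifier against a majority-vote pseudo-reference.

    predictions: (n_classifiers, n_items) array of non-negative integer
    labels. Returns one agreement score per classifier."""
    preds = np.asarray(predictions)
    consensus = np.apply_along_axis(
        lambda labels: np.bincount(labels).argmax(), 0, preds)
    return (preds == consensus).mean(axis=1)
```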
Two Methods for Validating Brain Tissue Classifiers
In this paper, we present an evaluation of seven automatic brain tissue classifiers based on levels of agreement. A number of agreement measures are explained, and we show how they can be used to compare different segmentation techniques. We use the Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm of Warfield et al., but also introduce a novel evaluation technique based on the Willia...
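For reference, here is a compact sketch of STAPLE in the binary case: an EM loop that alternates between estimating voxel-wise foreground probabilities (the hidden reference standard) and each rater's sensitivity and specificity. The initial values and iteration count are arbitrary choices, and the full algorithm also handles multi-category labels and spatial priors.

```python
import numpy as np

def staple_binary(decisions, n_iter=50):
    """Binary STAPLE: EM estimation of each rater's sensitivity p and
    specificity q together with the voxel-wise probability W that the
    hidden true label is foreground.

    decisions: (n_raters, n_voxels) array of 0/1 segmentations."""
    D = np.asarray(decisions, float)
    n_raters, _ = D.shape
    pi = D.mean()                        # prior P(true label = 1)
    p = np.full(n_raters, 0.9)           # initial sensitivities
    q = np.full(n_raters, 0.9)           # initial specificities
    for _ in range(n_iter):
        # E-step: posterior probability of foreground at each voxel.
        log_a = np.log(pi) + (D * np.log(p[:, None]) +
                              (1 - D) * np.log(1 - p[:, None])).sum(0)
        log_b = np.log(1 - pi) + ((1 - D) * np.log(q[:, None]) +
                                  D * np.log(1 - q[:, None])).sum(0)
        W = 1.0 / (1.0 + np.exp(log_b - log_a))
        # M-step: re-estimate rater performance, clipped for stability.
        p = np.clip((W * D).sum(1) / W.sum(), 1e-6, 1 - 1e-6)
        q = np.clip(((1 - W) * (1 - D)).sum(1) / (1 - W).sum(),
                    1e-6, 1 - 1e-6)
    return W, p, q
```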
Exploiting Semantic Relatedness Measures for Multi-label Classifier Evaluation
In the multi-label classification setting, documents can be labelled with a number of concepts (instead of just one). Evaluating the performance of classifiers in this scenario is often as simple as measuring the percentage of correctly assigned concepts. Classifiers that do not retrieve a single concept existing in the ground truth annotation are all considered equally poor. However, some clas...
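A graded alternative, sketched below under the assumption of some pairwise relatedness function with values in [0, 1] (e.g., a taxonomy-based measure), credits each ground-truth concept with the relatedness of the closest prediction, so semantically close misses are no longer scored the same as unrelated ones. This is one plausible instance of the idea, not the paper's exact metric.

```python
def graded_recall(predicted, truth, relatedness):
    """Credit each ground-truth concept with the relatedness of the
    closest predicted concept, instead of all-or-nothing matching.

    relatedness(a, b) -> float in [0, 1], with relatedness(a, a) == 1."""
    if not truth:
        return 1.0 if not predicted else 0.0
    return sum(max((relatedness(t, p) for p in predicted), default=0.0)
               for t in truth) / len(truth)
```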
Journal: NeuroImage
Volume: 36, Issue: 4
Pages: -
Publication year: 2007